Neurocontrol by Reinforcement Learning

نویسنده

  • G. Schram
چکیده

Reinforcement learning (RL) is a model-free tuning and adaptation method for control of dynamic systems. Contrary to supervised learning, based usually on gradient descent techniques, RL does not require any model or sensitivity function of the process. Hence, RL can be applied to systems that are poorly understood, uncertain, nonlinear or for other reasons untractable with conventional methods. In reinforcement learning, the overall controller performance is evaluated by a scalar measure, called reinforcement. Depending on the type of the control task, reinforcement may represent an evaluation of the most recent control action or, more often, of an entire sequence of past control moves. In the latter case, the RL system learns how to predict the outcome of each individual control action. This prediction is then used to adjust the parameters of the controller. The mathematical background of RL is closely related to optimal control and dynamic programming. This paper gives a comprehensive overview of the RL methods and presents an application to the attitude control of a satellite. Some well known applications from the literature are reviewed as well.

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

The Importance of Clipping in Neurocontrol by Direct Gradient Descent on the Cost-to-Go Function and in Adaptive Dynamic Programming

In adaptive dynamic programming, neurocontrol and reinforcement learning, the objective is for an agent to learn to choose actions so as to minimise a total cost function. In this paper we show that when discretized time is used to model the motion of the agent, it can be very important to do “clipping” on the motion of the agent in the final time step of the trajectory. By clipping we mean tha...

متن کامل

Pii: S0378-4754(99)00117-2

A novel approach to adaptive direct neurocontrol is discussed in this paper. The objective is to construct an adaptive control scheme for unknown time-dependent nonlinear plants without using a model of the plant. The proposed approach is neural-network based and combines the self-tuning principle with reinforcement learning. The control scheme consists of a controller, a utility estimator, an ...

متن کامل

Adaptive Critic Designs - Neural Networks, IEEE Transactions on

We discuss a variety of adaptive critic designs (ACD’s) for neurocontrol. These are suitable for learning in noisy, nonlinear, and nonstationary environments. They have common roots as generalizations of dynamic programming for neural reinforcement learning approaches. Our discussion of these origins leads to an explanation of three design families: Heuristic dynamic programming (HDP), dual heu...

متن کامل

Dynamic Obstacle Avoidance by Distributed Algorithm based on Reinforcement Learning (RESEARCH NOTE)

In this paper we focus on the application of reinforcement learning to obstacle avoidance in dynamic Environments in wireless sensor networks. A distributed algorithm based on reinforcement learning is developed for sensor networks to guide mobile robot through the dynamic obstacles. The sensor network models the danger of the area under coverage as obstacles, and has the property of adoption o...

متن کامل

Learning Spatio-Temporal Planning from a Dynamic Programming Teacher: Feed-Forward Neurocontrol for Moving Obstacle Avoidance

Within a simple test-bed, application of feed-forward neurocontrol for short-term planning of robot trajectories in a dynamic environment is studied. The action network is embedded in a sensorymotoric system architecture that contains a separate world model. It is continuously fed with short-term predicted spatio-temporal obstacle trajectories, and receives robot state feedback. The action net ...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 1996